6 research outputs found

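    Constructing a Database of Japanese Lexical Properties: An Overview of Its Basic Framework and Main Core Components (日本語語彙特性のデータベースの構築―その基礎枠組み及び主要中核要素の概観―)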

    In order to conduct meaningful research into all aspects of language, it is essential for language science and cognitive science researchers to have practical access to an increasingly wide range of detailed and contemporary information about their target languages. Against that background, this paper presents a short overview of an ongoing project to construct a large-scale database of Japanese lexical properties (JLP). More specifically, after outlining the concurrent construction of the ontology of Japanese lexical properties (JLP-O; Joyce & Hodošček, 2014), which provides the basic guiding framework for the JLP database construction project, the paper outlines the initial core components of the JLP database, with particular emphasis on two of those components: a database of semantic transparency (ST) ratings for approximately 10,000 two-kanji compound words, and some initial results for the extraction and automatic analysis of the word structures of both three- and four-kanji compound words.

    Distant Co-occurrence Patterns of Connectives: a Corpus Study of Formulaicity in Japanese

    Using corpus research methods, this study aims to establish whether there are two-item and, more generally, multi-item distant co-occurrence patterns of connectives in written Japanese, and further, to clarify the role these patterns play in discourse. The study is based on a hybrid corpus of written Japanese that includes humanities and social science papers, science and technology papers, and general written language data. A pair was counted as co-occurring when its co-occurrence frequency was > 10, its PMI value > 2, and its Dice coefficient > 0.01. The distribution of the observed co-occurring pairs differed according to genre. Visualization of the connectivity potential of co-occurring pairs as directed graphs showed that these pairs constitute longer co-occurrence chains, which can be interpreted as ready-made co-occurrence patterns. Two-item and multi-item co-occurrence patterns are considered a type of Bourdieu's habitus and contribute to both discourse development and discourse prediction.
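The three thresholds named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of pointwise mutual information (here with log base 2) and the Dice coefficient, computed from raw counts. The abstract states neither the log base nor the unit over which co-occurrence is counted, so both are assumptions, and the function names and example counts are hypothetical rather than taken from the study.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information, log base 2 (assumed), from raw counts.

    total is the number of observation units (e.g. texts or windows);
    the abstract does not say which unit the study used.
    """
    return math.log2((pair_count * total) / (x_count * y_count))


def dice(pair_count, x_count, y_count):
    """Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    return 2 * pair_count / (x_count + y_count)


def passes_thresholds(pair_count, x_count, y_count, total,
                      min_freq=10, min_pmi=2.0, min_dice=0.01):
    """Apply the three strict thresholds quoted in the abstract."""
    return (
        pair_count > min_freq
        and pmi(pair_count, x_count, y_count, total) > min_pmi
        and dice(pair_count, x_count, y_count) > min_dice
    )


# Hypothetical counts: two connectives seen 100 and 200 times,
# co-occurring 20 times across 10,000 units.
keep = passes_thresholds(20, 100, 200, 10_000)
```

Because all three comparisons are strict, a pair with a co-occurrence frequency of exactly 10 is rejected under this reading of the thresholds.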

    borh/jsrs: Improved README; DOI

    Japanese Speech Rating System

    Support Systems for Learning Japanese: A Report on the Hinoki Project (Podporni sistemi za učenje japonščine: poročilo o projektu Hinoki)

    In this report, we introduce the Hinoki project, which set out more than a decade ago to develop web-based Computer-Assisted Language Learning (CALL) systems for Japanese language learners. Utilizing Natural Language Processing technologies and other linguistic resources, the project has come to encompass three systems, two corpora and many other resources. Beginning with the reading assistance system Asunaro, we describe the construction of Asunaro's multilingual dictionary and its dependency grammar-based approach to reading assistance. The second system, Natsume, is a writing assistance system that uses large-scale corpora to provide an easy-to-use collocation search feature, notable for its inclusion of the concept of genre. The final system, Nutmeg, is an extension of Natsume and the Natane learner corpus. It provides automatic correction of learners' errors in compositions by using Natsume for its large corpus and genre-aware collocation data and Natane for its data on learner errors.

    A Foreign Language Acquisition Corpus for Students of Japanese (Korpus usvajanja tujega jezika za študente japonščine)

    Japanese language learners aim to acquire reading, listening, writing and speaking skills. We at the Hinoki project (https://hinoki-project.org/) have recently been working on the Natsume collocation search system (https://hinoki-project.org/natsume/), the Natane learner corpus to support Natsume (https://hinoki-project.org/natane/) and the Nutmeg writing support system (http://hinoki-project.org/nutmeg/). In order to test the effectiveness of Nutmeg, we conducted an online experiment with 36 participants who used the system's register misuse identification feature to correct four writing assignments. Results show that Nutmeg can be an effective tool in correcting common register-related errors, especially those involving auxiliary verbs. However, the accuracy of verb and adverb identification was too low, suggesting the need for improvements in the variety of corpora used for identifying register misuse.